AITopics | kl distance

Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications

Bolatov, Arman, Legg, Alan, Melnykov, Igor, Nurlanuly, Amantay, Tezekbayev, Maxat, Assylbekov, Zhenisbek

arXiv.org Machine Learning, Nov-3-2025

This study explores the classification error of Mixture Discriminant Analysis (MDA) in scenarios where the number of mixture components exceeds those present in the actual data distribution, a condition known as overspecification. We use a two-component Gaussian mixture model within each class to fit data generated from a single Gaussian, analyzing both the algorithmic convergence of the Expectation-Maximization (EM) algorithm and the statistical classification error. We demonstrate that, with suitable initialization, the EM algorithm converges exponentially fast to the Bayes risk at the population level. Further, we extend our results to finite samples, showing that the classification error converges to Bayes risk with a rate $n^{-1/2}$ under mild conditions on the initial parameter estimates and sample size. This work provides a rigorous theoretical framework for understanding the performance of overspecified MDA, which is often used empirically in complex data settings, such as image and text classification. To validate our theory, we conduct experiments on remote sensing datasets.

artificial intelligence, convergence, machine learning, (15 more...)

arXiv.org Machine Learning

2510.27056

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia (0.04)
North America > United States > Minnesota > St. Louis County > Duluth (0.04)
(9 more...)

Genre: Research Report > New Finding (0.88)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.60)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Deficiency of equation-finding approach to data-driven modeling of dynamical systems

Zhai, Zheng-Meng, Lucarini, Valerio, Lai, Ying-Cheng

arXiv.org Artificial Intelligence, Sep-5-2025

Department of Physics, Arizona State University, Tempe, Arizona 85287, USA (Dated: September 5, 2025)

Finding the governing equations from data by sparse optimization has become a popular approach to deterministic modeling of dynamical systems. Considering the physical situations where the data can be imperfect due to disturbances and measurement errors, we show that for many chaotic systems, widely used sparse-optimization methods for discovering governing equations produce models that depend sensitively on the measurement procedure, yet all such models generate virtually identical chaotic attractors, leading to a striking limitation that challenges the conventional notion of equation-based modeling in complex dynamical systems. Calculating the Koopman spectra, we find that the different sets of equations agree in their large eigenvalues and the differences begin to appear when the eigenvalues are smaller than an equation-dependent threshold. The results suggest that finding the governing equations of the system and attempting to interpret them physically may lead to misleading conclusions. It would be more useful to work directly with the available data using, e.g., machine-learning methods. In physical science, a methodology of biblical importance is developing a quantitative model by extracting a set of governing equations from experimental data.

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2509.03769

Country:

North America > United States > Arizona > Maricopa County > Tempe (0.24)
Europe > United Kingdom > England > Cambridgeshire (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Vision Transformer attention alignment with human visual perception in aesthetic object evaluation

Carrasco, Miguel, González-Martín, César, Aranda, José, Oliveros, Luis

arXiv.org Artificial Intelligence, Jul-24-2025

Visual attention mechanisms play a crucial role in human perception and aesthetic evaluation. Recent advances in Vision Transformers (ViTs) have demonstrated remarkable capabilities in computer vision tasks, yet their alignment with human visual attention patterns remains underexplored, particularly in aesthetic contexts. This study investigates the correlation between human visual attention and ViT attention mechanisms when evaluating handcrafted objects. We conducted an eye-tracking experiment with 30 participants (9 female, 21 male, mean age 24.6 years) who viewed 20 artisanal objects comprising basketry bags and ginger jars. Using a Pupil Labs eye-tracker, we recorded gaze patterns and generated heat maps representing human visual attention. Simultaneously, we analyzed the same objects using a pre-trained ViT model with DINO (Self-DIstillation with NO Labels), extracting attention maps from each of the 12 attention heads. We compared human and ViT attention distributions using Kullback-Leibler divergence across varying Gaussian parameters (sigma = 0.1 to 3.0). Statistical analysis revealed optimal correlation at sigma = 2.4 ± 0.03, with attention head #12 showing the strongest alignment with human visual patterns. Significant differences were found between attention heads, with heads #7 and #9 demonstrating the greatest divergence from human attention (p < 0.05, Tukey HSD test). Results indicate that while ViTs exhibit more global attention patterns compared to human focal attention, certain attention heads can approximate human visual behavior, particularly for specific object features like buckles in basketry items. These findings suggest potential applications of ViT attention mechanisms in product design and aesthetic evaluation, while highlighting fundamental differences in attention strategies between human perception and current AI models.
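For readers unfamiliar with the comparison described in this abstract, the following is a minimal sketch of a Kullback-Leibler divergence between two attention maps. It is not the authors' pipeline: the function names `normalize` and `kl_divergence`, the `eps` smoothing constant, and the toy 2x2 maps are all assumptions made for illustration, and the Gaussian smoothing step controlled by sigma is omitted.

```python
import math

def normalize(heatmap, eps=1e-12):
    """Flatten a 2-D heat map and rescale it so the entries sum to 1.

    eps is a hypothetical smoothing constant that keeps every entry
    strictly positive so the logarithm below is always defined.
    """
    flat = [max(v, 0.0) + eps for row in heatmap for v in row]
    total = sum(flat)
    return [v / total for v in flat]

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy data (made up): a focal "human" map and a uniform "attention head" map.
human = normalize([[0.0, 1.0], [3.0, 0.0]])
head = normalize([[1.0, 1.0], [1.0, 1.0]])

print(kl_divergence(human, head))  # focal map measured against the uniform one
print(kl_divergence(head, human))  # the reverse direction gives a different value
```

The two printed values differ because KL divergence is not symmetric, which is why "KL distance" is a loose name: it is not a metric in the mathematical sense.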

artificial intelligence, machine learning, participant, (17 more...)

arXiv.org Artificial Intelligence

2507.17616

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Reviews: Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation

Neural Information Processing Systems, Jan-20-2025, 12:44:55 GMT

The most original part of the paper is Proposition 1, which is quite interesting. However, I have some doubts regarding the assumptions leading to formula (2). As explained in the appendix, this formula holds if q_theta is complex enough that the KL distance is zero. In a realistic example with a finite sample size, however, q_theta can't be very complex, otherwise it would over-fit. Hence, (2) holds only approximately.

bayesian conditional density estimation, posterior, simulation model, (11 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.40)

Extended Grassmann Kernels for Subspace-Based Learning

Neural Information Processing Systems, Apr-6-2023, 14:31:00 GMT

Subspace-based learning problems involve data whose elements are linear subspaces of a vector space. To handle such data structures, Grassmann kernels have been proposed and used previously. In this paper, we analyze the relationship between Grassmann kernels and probabilistic similarity measures. Firstly, we show that the KL distance in the limit yields the Projection kernel on the Grassmann manifold, whereas the Bhattacharyya kernel becomes trivial in the limit and is suboptimal for subspace-based problems. Secondly, based on our analysis of the KL distance, we propose extensions of the Projection kernel which can be extended to the set of affine as well as scaled subspaces.

extended grassmann kernel, projection kernel, subspace-based learning, (1 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Scaling Laws for Reward Model Overoptimization

Gao, Leo, Schulman, John, Hilton, Jacob

arXiv.org Artificial Intelligence, Oct-19-2022

In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart's law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed "gold-standard" reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-$n$ sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.

arxiv preprint arxiv, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2210.1076

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hellinger KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

Roy, Arghyadip, Shakkottai, Sanjay, Srikant, R.

arXiv.org Machine Learning, Sep-14-2020

In the regret-based formulation of multi-armed bandit (MAB) problems, except in rare instances, much of the literature focuses on arms with i.i.d. rewards. In this paper, we consider the problem of obtaining regret guarantees for MAB problems in which the rewards of each arm form a Markov chain which may not belong to a single parameter exponential family. To achieve logarithmic regret in such problems is not difficult: a variation of standard KL-UCB does the job. However, the constants obtained from such an analysis are poor for the following reason: i.i.d. rewards are a special case of Markov rewards and it is difficult to design an algorithm that works well independent of whether the underlying model is truly Markovian or i.i.d. To overcome this issue, we introduce a novel algorithm that identifies whether the rewards from each arm are truly Markovian or i.i.d. using a Hellinger distance-based test. Our algorithm then switches from using a standard KL-UCB to a specialized version of KL-UCB when it determines that the arm reward is Markovian, thus resulting in low regret for both i.i.d. and Markovian settings.
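As background for the "standard KL-UCB" this abstract refers to, here is a minimal sketch of the KL-UCB index for a single arm with Bernoulli rewards. It is not the paper's algorithm: the Hellinger distance-based test and the specialized Markovian variant are not shown, and the function names, the plain `log(t)` exploration budget (no log-log term), and the bisection tolerance are assumptions chosen for brevity.

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats.

    Both arguments are clamped away from 0 and 1 (eps is a hypothetical
    guard value) so the logarithms stay finite.
    """
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, tol=1e-9):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t).

    bernoulli_kl(mean, q) is increasing in q for q >= mean, so the
    boundary can be located by bisection.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Same empirical mean, different numbers of pulls, at round t = 100.
print(kl_ucb_index(0.5, 10, 100))
print(kl_ucb_index(0.5, 100, 100))
```

An arm that has been pulled more often receives a tighter index, so the second value printed is closer to the empirical mean than the first.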

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

2009.06606

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Illinois > Champaign County > Champaign (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.89)

Extended Grassmann Kernels for Subspace-Based Learning

Hamm, Jihun, Lee, Daniel D.

Neural Information Processing Systems, Feb-15-2020, 01:57:11 GMT

extended grassmann kernel, projection kernel, subspace-based learning, (1 more...)

Neural Information Processing Systems

Industry: Education > Focused Education > Special Education (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Structural Damage Detection and Localization with Unknown Post-Damage Feature Distribution Using Sequential Change-Point Detection Method

Liao, Yizheng, Kiremidjian, Anne S., Rajagopal, Ram, Loh, Chin-Hsuing

arXiv.org Machine Learning, Nov-14-2018

The high rate of structural deficiency poses serious risks to the operation of many bridges and buildings. To prevent critical damage and structural collapse, a quick structural health diagnosis tool is needed during normal operation or immediately after extreme events. In structural health monitoring (SHM), many existing methods have limited performance in quick damage identification because 1) the damage event needs to be identified with short delay and 2) the post-damage information is usually unavailable. To address these drawbacks, we propose a new damage detection and localization approach based on stochastic time series analysis. Specifically, the damage-sensitive features are extracted from vibration signals and follow different distributions before and after a damage event. Hence, we use optimal change-point detection theory to find the damage occurrence time. As existing change-point detectors require the post-damage feature distribution, which is unavailable in SHM, we propose a maximum likelihood method to learn the distribution parameters from the time-series data. The proposed damage detection using estimated parameters also achieves optimal performance. In addition, we use the detection results to find the damage location without any further computation. Validation results show highly accurate damage identification in the American Society of Civil Engineers benchmark structure and two shake table experiments.
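To illustrate the kind of sequential change-point detector this abstract builds on, here is a minimal CUSUM sketch for a shift in the mean of Gaussian samples. It is a textbook baseline, not the authors' method: it assumes the post-change mean `mu1` is known, which is exactly the assumption the paper relaxes by estimating the post-damage parameters with maximum likelihood. The function name, parameter names, threshold, and toy data are all hypothetical.

```python
def cusum_alarm(samples, mu0, mu1, sigma, threshold):
    """Return the first index at which the CUSUM statistic reaches threshold.

    Assumes samples are N(mu0, sigma^2) before the change and
    N(mu1, sigma^2) after it. Returns None if no alarm is raised.
    """
    stat = 0.0
    for i, x in enumerate(samples):
        # Log-likelihood ratio of the post-change model against the
        # pre-change model for a single sample.
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        # Accumulate evidence, resetting to zero whenever it goes negative.
        stat = max(0.0, stat + llr)
        if stat >= threshold:
            return i
    return None

# Toy signal (made up): the mean jumps from 0 to 1 at index 20.
signal = [0.0] * 20 + [1.0] * 20
print(cusum_alarm(signal, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0))
```

The expected drift of the statistic after the change equals the KL divergence between the post-change and pre-change distributions, which is how the KL distance enters the detection-delay analysis of such detectors.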

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1812.02824

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Consumer Health (0.49)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
(2 more...)

When Not to Classify: Detection of Reverse Engineering Attacks on DNN Image Classifiers

Wang, Yujia, Miller, David J., Kesidis, George

arXiv.org Machine Learning, Oct-31-2018

This paper addresses detection of a reverse engineering (RE) attack targeting a deep neural network (DNN) image classifier; by querying, RE's aim is to discover the classifier's decision rule. RE can enable test-time evasion attacks, which require knowledge of the classifier. Recently, we proposed a quite effective approach (ADA) to detect test-time evasion attacks. In this paper, we extend ADA to detect RE attacks (ADA-RE). We demonstrate our method is successful in detecting "stealthy" RE attacks before they learn enough to launch effective test-time evasion attacks.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Machine Learning

1811.02658

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)